classification report
Mind Reading or Misreading? LLMs on the Big Five Personality Test
Di Cursi, Francesco, Boldrini, Chiara, Conti, Marco, Passarella, Andrea
We evaluate large language models (LLMs) for automatic personality prediction from text (APPT) under the binary Five Factor Model (BIG5). Five models -- including GPT-4 and lightweight open-source alternatives -- are tested across three heterogeneous datasets (Essays, MyPersonality, Pandora) and two prompting strategies (minimal vs. enriched with linguistic and psychological cues). Enriched prompts reduce invalid outputs and improve class balance, but also introduce a systematic bias toward predicting trait presence. Performance varies substantially: Openness and Agreeableness are relatively easier to detect, while Extraversion and Neuroticism remain challenging. Although open-source models sometimes approach GPT-4 and prior benchmarks, no configuration yields consistently reliable predictions in zero-shot binary settings. Moreover, aggregate metrics such as accuracy and macro-F1 mask significant asymmetries, with per-class recall offering clearer diagnostic value. These findings show that current out-of-the-box LLMs are not yet suitable for APPT, and that careful coordination of prompt design, trait framing, and evaluation metrics is essential for interpretable results.
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
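The abstract's point about aggregate metrics masking asymmetries can be made concrete with a small sketch (illustrative, not the authors' code): a classifier biased toward predicting "trait present" can post a respectable accuracy while per-class recall exposes the failure on the negative class.

```python
# Illustrative sketch: why per-class recall is more diagnostic than accuracy
# when a model is biased toward predicting trait presence (label 1).
def per_class_recall(y_true, y_pred):
    """Recall per class: correct predictions of a class / true members of it."""
    recalls = {}
    for cls in sorted(set(y_true)):
        relevant = [p for t, p in zip(y_true, y_pred) if t == cls]
        recalls[cls] = sum(1 for p in relevant if p == cls) / len(relevant)
    return recalls

# A hypothetical model that almost always predicts "trait present" (1):
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                           # 0.7 looks acceptable...
print(per_class_recall(y_true, y_pred))   # ...but class-0 recall is only 0.25
```

The numbers here are toy values chosen to show the effect, not results from the paper.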
Semantic Preprocessing for LLM-based Malware Analysis
Marais, Benjamin, Quertier, Tony, Barrue, Grégoire
In the context of malware analysis, numerous approaches rely on Artificial Intelligence to handle a large volume of data. However, these techniques focus on raw data representations (images, sequences) rather than on an expert's view. To address this issue, we propose a preprocessing step that focuses on expert knowledge to improve malware semantic analysis and result interpretability. We propose a new preprocessing method which creates JSON reports for Portable Executable files. These reports gather features from both static and behavioral analysis, and incorporate packer signature detection, MITRE ATT&CK and Malware Behavior Catalog (MBC) knowledge. The purpose of this preprocessing is to gather a semantic representation of binary files, understandable by malware analysts, that can enhance AI models' explainability for malicious file analysis. Using this preprocessing to train a Large Language Model for malware classification, we achieve a weighted-average F1-score of 0.94 on a complex dataset, representative of market reality.
- North America > United States > California > Santa Clara County > Santa Clara (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Europe > France (0.04)
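The shape of such a semantic report can be sketched as follows. This is a hypothetical example in the spirit of the abstract: the field names and values are assumptions for illustration, not the authors' actual schema (T1055 is the real MITRE ATT&CK ID for Process Injection; the MBC entry is a placeholder).

```python
import json

# Hypothetical semantic JSON report for a PE file, combining static and
# behavioral features as the abstract describes. Schema is illustrative.
report = {
    "sha256": "0000...deadbeef",                 # file identifier (placeholder)
    "static": {
        "packer": "UPX",                          # packer signature detection
        "imports": ["CreateRemoteThread", "VirtualAllocEx"],
    },
    "behavioral": {
        "mitre_attack": ["T1055"],                # Process Injection
        "mbc": ["EXAMPLE-ID"],                    # placeholder MBC identifier
    },
}

serialized = json.dumps(report, indent=2)
print(serialized)
```

A report like this stays readable for an analyst while remaining plain text that a language model can consume.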
Discovering Software Parallelization Points Using Deep Neural Networks
Correia, Izavan dos S., Santos, Henrique C. T., Ferreira, Tiago A. E.
This study proposes a deep learning-based approach for classifying loops in programming code according to their potential for parallelization. Two genetic algorithm-based code generators were developed to produce two distinct types of code: (i) independent loops, which are parallelizable, and (ii) ambiguous loops, whose dependencies are unclear, making it impossible to determine whether they are parallelizable. The generated code snippets were tokenized and preprocessed to ensure a robust dataset. Two deep learning models - a Deep Neural Network (DNN) and a Convolutional Neural Network (CNN) - were implemented to perform the classification. Based on 30 independent runs, a robust statistical analysis was employed to verify the expected performance of both models. The CNN showed a slightly higher mean performance, but the two models had similar variability. Experiments with varying dataset sizes highlighted the importance of data diversity for model performance. These results demonstrate the feasibility of using deep learning to automate the identification of parallelizable structures in code, offering a promising tool for software optimization and performance improvement.
- South America > Brazil > Pernambuco > Recife (0.05)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- North America > United States > Massachusetts > Middlesex County > Reading (0.04)
- (4 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.94)
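The two loop classes the generators produce can be illustrated with a minimal sketch (not code from the paper; the examples and names are assumptions):

```python
# (i) Independent loop: each iteration writes a distinct element and reads
# only the inputs, so iterations could run in parallel.
def independent(a, b):
    out = [0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i]
    return out

# (ii) Ambiguous loop: iteration i reads out[idx[i]], which may alias an
# element written by another iteration -- without knowing idx at compile
# time, an analyzer cannot decide whether the loop is parallelizable.
def ambiguous(a, idx):
    out = a[:]
    for i in range(len(a)):
        out[i] = out[idx[i]] + 1
    return out

print(independent([1, 2], [3, 4]))  # [4, 6]
```

In the ambiguous case, the sequential result depends on the order of iterations whenever `idx[i]` points at an already-updated slot, which is exactly the dependency uncertainty the classifier has to detect.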
Machine learning based animal emotion classification using audio signals
Slobodian, Mariia, Kozlenko, Mykola
This paper presents a machine learning approach to the automated classification of a dog's emotional state based on the processing and recognition of audio signals. It offers helpful information for improving human-machine interfaces and developing more precise tools for classifying emotions from acoustic data. The presented model achieves an overall accuracy above 70% on audio signals recorded from a single dog.
- Europe > Ukraine > Ivano-Frankivsk Oblast > Ivano-Frankivs'k (0.07)
- North America > United States (0.05)
- Europe > Switzerland (0.05)
Real-world validation of a multimodal LLM-powered pipeline for High-Accuracy Clinical Trial Patient Matching leveraging EHR data
Callies, Anatole, Bodinier, Quentin, Ravaud, Philippe, Davarpanah, Kourosh
Background: Patient recruitment in clinical trials is hindered by complex eligibility criteria and labor-intensive chart reviews. Prior research using text-only models has struggled to address this problem in a reliable and scalable way due to (1) limited reasoning capabilities, (2) information loss from converting visual records to text, and (3) lack of a generic EHR integration to extract patient data. Methods: We introduce a broadly applicable, integration-free, LLM-powered pipeline that automates patient-trial matching using unprocessed documents extracted from EHRs. Our approach leverages (1) the new reasoning-LLM paradigm, enabling the assessment of even the most complex criteria, (2) the visual capabilities of the latest LLMs to interpret medical records without lossy image-to-text conversions, and (3) multimodal embeddings for efficient medical record search. The pipeline was validated on the n2c2 2018 cohort selection dataset (288 diabetic patients) and a real-world dataset composed of 485 patients from 30 different sites matched against 36 diverse trials. Results: On the n2c2 dataset, our method achieved a new state-of-the-art criterion-level accuracy of 93%. In real-world trials, the pipeline yielded an accuracy of 87%, undermined by the difficulty of replicating human decision-making when medical records lack sufficient information. Nevertheless, users were able to review overall eligibility in under 9 minutes per patient on average, representing an 80% improvement over traditional manual chart reviews. Conclusion: This pipeline demonstrates robust performance in clinical trial patient matching without requiring custom integration with site systems or trial-specific tailoring, thereby enabling scalable deployment across sites seeking to leverage AI for patient matching.
- Europe > France > Île-de-France > Paris > Paris (0.04)
- North America > United States > Massachusetts (0.04)
- Europe > United Kingdom > England (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
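The embedding-based record search mentioned in the methods can be sketched minimally (an assumption-laden illustration, not the authors' pipeline): pages of a chart are ranked by cosine similarity to a criterion embedding, with toy 3-dimensional vectors standing in for the multimodal embeddings an external model would produce.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv)

def top_pages(criterion_vec, page_vecs, k=2):
    """Return the k page names most similar to the criterion embedding."""
    ranked = sorted(page_vecs.items(),
                    key=lambda kv: cosine(criterion_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Toy embeddings; file names and vectors are hypothetical.
pages = {
    "lab_results.pdf":    [0.9, 0.1, 0.0],
    "discharge_note.pdf": [0.1, 0.9, 0.1],
    "insurance_form.pdf": [0.0, 0.1, 0.9],
}
print(top_pages([1.0, 0.2, 0.0], pages, k=1))  # ['lab_results.pdf']
```

Retrieval like this narrows hundreds of unprocessed pages down to the few worth sending to the (more expensive) reasoning model.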
A Comparative Study of Diabetes Prediction Based on Lifestyle Factors Using Machine Learning
Diabetes is a prevalent chronic disease with significant health and economic burdens worldwide. Early prediction and diagnosis can aid in effective management and prevention of complications. This study explores the use of machine learning models to predict diabetes based on lifestyle factors using data from the Behavioral Risk Factor Surveillance System (BRFSS) 2015 survey. The dataset consists of 21 lifestyle and health-related features, capturing aspects such as physical activity, diet, mental health, and socioeconomic status. Three classification models, Decision Tree, K-Nearest Neighbors (KNN), and Logistic Regression, are implemented and evaluated to determine their predictive performance. The models are trained and tested using a balanced dataset, and their performances are assessed based on accuracy, precision, recall, and F1-score. The results indicate that the Decision Tree, KNN, and Logistic Regression achieve an accuracy of 0.74, 0.72, and 0.75, respectively, with varying strengths in precision and recall. The findings highlight the potential of machine learning in diabetes prediction and suggest future improvements through feature selection and ensemble learning techniques.
- North America > United States > California (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > New Finding (0.94)
- Research Report > Experimental Study (0.61)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.62)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.58)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.51)
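The evaluation metrics named in the abstract are all derived from the same confusion-matrix counts; a small sketch makes the relationships explicit (the counts below are illustrative toy values, not BRFSS results):

```python
# Accuracy, precision, recall, and F1 from binary confusion-matrix counts.
def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)          # of predicted positives, how many are real
    recall = tp / (tp + fn)             # of real positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Toy counts for a hypothetical classifier on a balanced 1000-sample test set:
acc, prec, rec, f1 = metrics(tp=370, fp=120, fn=130, tn=380)
print(round(acc, 2), round(prec, 2), round(rec, 2), round(f1, 2))
```

Reporting all four together, as the study does, matters because two models with equal accuracy can trade precision against recall quite differently.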
Learning Sign Language Representation using CNN LSTM, 3DCNN, CNN RNN LSTM and CNN TD
Louison, Nikita, Goodridge, Wayne, Khan, Koffka
Existing Sign Language Learning applications focus on the demonstration of the sign in the hope that the student will copy a sign correctly. In these cases, only a teacher can confirm that the sign was completed correctly, by reviewing a manually captured video. Sign Language Translation is a widely explored field in visual recognition. This paper seeks to explore the algorithms that will allow for real-time video sign translation and grading of sign language accuracy for new sign language users. This requires algorithms capable of recognizing and processing spatial and temporal features. The aim of this paper is to evaluate and identify the best neural network algorithm that can facilitate a sign language tuition system of this nature. Modern popular algorithms including CNN and 3DCNN are compared on a previously unexplored dataset, Trinidad and Tobago Sign Language (TTSL), as well as an American Sign Language (ASL) dataset. The 3DCNN algorithm was found to be the best performing neural network algorithm from these systems, with 91% accuracy on the TTSL dataset and 83% accuracy on the ASL dataset.
- North America > Trinidad and Tobago (0.25)
- Asia > Singapore (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
Predicting DNA fragmentation: A non-destructive analogue to chemical assays using machine learning
Jacobs, Byron A, Shaik, Ifthakaar, Lin, Frando
Globally, infertility rates are increasing, with 2.5% of all births being assisted by in vitro fertilisation (IVF) in 2022. Male infertility is the cause of approximately half of these cases. The quality of sperm DNA has a substantial impact on the success of IVF. The assessment of sperm DNA is traditionally done through chemical assays which render sperm cells ineligible for IVF. Many compounding factors contribute to the population crisis, with fertility rates dropping globally in recent history; as such, assisted reproductive technologies (ART) have been the focus of recent research efforts. Simultaneously, artificial intelligence has grown ubiquitous and is permeating more aspects of modern life. Building on the successes of state-of-the-art machine learning across many sectors, this work proposes a novel framework for the prediction of sperm cell DNA fragmentation from images of unstained sperm, rendering a predictive model which preserves sperm integrity and allows for optimal selection of sperm for IVF.
- North America > United States (0.68)
- Asia > Singapore > Central Region > Singapore (0.04)
- Africa > South Africa > Gauteng > Pretoria (0.04)
- (2 more...)
GPT-4 Generated Narratives of Life Events using a Structured Narrative Prompt: A Validation Study
Lynch, Christopher J., Jensen, Erik, Munro, Madison H., Zamponi, Virginia, Martinez, Joseph, O'Brien, Kevin, Feldhaus, Brandon, Smith, Katherine, Reinhold, Ann Marie, Gore, Ross
Large Language Models (LLMs) play a pivotal role in generating vast arrays of narratives, facilitating a systematic exploration of their effectiveness for communicating life events in narrative form. In this study, we employ a zero-shot structured narrative prompt to generate 24,000 narratives using OpenAI's GPT-4. From this dataset, we manually classify 2,880 narratives and evaluate their validity in conveying birth, death, hiring, and firing events. Remarkably, 87.43% of the narratives sufficiently convey the intention of the structured prompt. To automate the identification of valid and invalid narratives, we train and validate nine Machine Learning models on the classified datasets. Leveraging these models, we extend our analysis to predict the classifications of the remaining 21,120 narratives. All the ML models excelled at classifying valid narratives as valid, but struggled to simultaneously classify invalid narratives as invalid. Our findings not only advance the study of LLM capabilities, limitations, and validity but also offer practical insights for narrative generation and natural language processing applications.
- North America > Puerto Rico > Peñuelas > Peñuelas (0.04)
- North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.04)
- North America > United States > Virginia > Suffolk (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
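A structured narrative prompt of the kind the study describes can be sketched as a template function. The exact template is not given in the abstract, so the fields and wording below are assumptions for illustration only:

```python
# Hypothetical zero-shot structured narrative prompt builder. The event
# types match those in the study; everything else is illustrative.
def build_prompt(event, name, age):
    return (
        f"Write a short first-person narrative about a {event} event.\n"
        f"Constraints:\n"
        f"- Narrator: {name}, age {age}.\n"
        f"- The narrative must clearly convey that the {event} occurred.\n"
        f"- Length: 3-5 sentences."
    )

prompt = build_prompt("hiring", "Alex", 34)
print(prompt)
```

Fixing the structure of the prompt is what makes validity checkable at scale: a narrative is "valid" when it satisfies the stated constraints, which gives the downstream classifiers a concrete target.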
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data
Large Vision-Language Models (VLMs) have demonstrated impressive performance on complex tasks involving visual input with natural language instructions. However, it remains unclear to what extent capabilities on natural images transfer to Earth observation (EO) data, which are predominantly satellite and aerial images less common in VLM training data. In this work, we propose a comprehensive benchmark to gauge the progress of VLMs toward being useful tools for EO data by assessing their abilities on scene understanding, localization and counting, and change detection tasks. Motivated by real-world applications, our benchmark includes scenarios like urban monitoring, disaster relief, land use, and conservation. We discover that, although state-of-the-art VLMs like GPT-4V possess extensive world knowledge that leads to strong performance on open-ended tasks like location understanding and image captioning, their poor spatial reasoning limits usefulness on object localization and counting tasks. Our benchmark will be made publicly available on this website and on Hugging Face for easy model evaluation.
- North America > United States > Massachusetts (0.14)
- North America > United States > California (0.14)
- North America > United States > Pennsylvania (0.13)
- (5 more...)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Air (1.00)
- Leisure & Entertainment > Sports (1.00)
- (9 more...)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)